ABSTRACT

Genetic changes underlie tumor progression and may lead to cancer-specific
expression of critical genes. Over 1100 publications have described the use
of comparative genomic hybridization (CGH) to analyze the pattern of copy
number alterations in cancer, but very few of the genes affected are known.
Here, we performed high-resolution CGH analysis on cDNA microarrays in
breast cancer and directly compared copy number and mRNA expression levels
of 13,824 genes to quantitate the impact of genomic changes on gene
expression. We identified and mapped the boundaries of 24 independent
amplicons, ranging in size from 0.2 to 12 Mb. Throughout the genome, both
high- and low-level copy number changes had a substantial impact on gene
expression, with 44% of the highly amplified genes showing overexpression
and 10.5% of the highly overexpressed genes being amplified. Statistical
analysis with random permutation tests identified 270 genes whose
expression levels across 14 samples were systematically attributable to
gene amplification. These included most previously described amplified
genes in breast cancer and many novel targets for genomic alterations,
including the HOXB7 gene, the presence of which in a novel amplicon at
17q21.3 was validated in 10.2% of primary breast cancers and associated
with poor patient prognosis. In conclusion, CGH on cDNA microarrays
revealed hundreds of novel genes whose overexpression is attributable to
gene amplification. These genes may provide insights to the clonal
evolution and progression of breast cancer and highlight promising
therapeutic targets.

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have facilitated
classification of cancers into biologically distinct categories, some of
which may explain the clinical behavior of the tumors (1-6). Despite this
progress in diagnostic classification, the molecular mechanisms underlying
gene expression patterns in cancer have remained elusive, and the utility
of gene expression profiling in the identification of specific therapeutic
targets remains unclear.

Accumulation of genetic defects is thought to underlie the clonal
evolution of cancer. Identification of the genes that mediate the effects
of genetic changes may be important by highlighting transcripts that are
actively involved in tumor progression. Such transcripts and their encoded
proteins would be ideal targets for anticancer therapies, as demonstrated
by the clinical success of new therapies against amplified oncogenes, such
as ERBB2 and EGFR (7, 8), in breast cancer and other solid tumors. Besides
amplifications of known oncogenes, over 20 recurrent regions of DNA
amplification have been mapped in breast cancer by CGH⁵ (9, 10). However,
these amplicons are often large and poorly defined, and their impact on
gene expression remains unknown.

Fig. 1. Impact of gene copy number on global gene expression levels. A,
percentage of over- and underexpressed genes (Y axis) according to copy
number ratios (X axis). Threshold values used for over- and underexpression
were >2.184 (global upper 7% of the cDNA ratios) and <0.4826 (global lower
7% of the expression ratios). B, percentage of [...]

We hypothesized that genome-wide identification of those gene expression
changes that are attributable to underlying gene copy number alterations
would highlight transcripts that are actively involved in the causation or
maintenance of the malignant phenotype. To identify such transcripts, we
applied a combination of cDNA and CGH microarrays to: (a) determine the
global impact that gene copy number variation plays in breast cancer
development and progression; and (b) identify and characterize those genes
whose mRNA expression is most significantly associated with amplification
of the corresponding genomic template.

5 The abbreviations used are: CGH, comparative genomic hybridization; FISH,
fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR.

Fig. 2. Genome-wide copy number and expression analysis in the MCF-7 breast
cancer cell line. A, chromosomal CGH analysis of MCF-7. The copy number
ratio profile (blue line) across the entire genome from 1p telomere to Xq
telomere is shown along with ±1 SD (orange lines). The black horizontal
line indicates a ratio of 1.0; red line, a ratio of 0.8; and green line, a
ratio of 1.2. B-C, genome-wide copy number analysis in MCF-7 by CGH on cDNA
microarray. The copy number ratios were plotted as a function of the
position of the cDNA clones along the human genome. In B, individual data
points are connected with a line, and a moving median of 10 adjacent clones
is shown. Red horizontal line, the copy number ratio of 1.0. In C,
individual data points are labeled by color coding according to cDNA
expression ratios. The bright red dots indicate the upper 2%, and dark red
dots, the next 5% of the expression ratios in MCF-7 cells (overexpressed
genes); bright green dots indicate the lowest 2%, and dark green dots, the
next 5% of the expression ratios (underexpressed genes); the rest of the
observations are shown with black crosses. The chromosome numbers are shown
at the bottom of the figure, and chromosome boundaries are indicated with a
dashed line.

MATERIALS AND METHODS

Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, BT-474,
HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468, SKBR-3, T-47D,
UACC812, ZR-75-1, and ZR-75-30) were obtained from the American Type
Culture Collection (Manassas, VA). Cells were grown under recommended
culture conditions. Genomic DNA and mRNA were isolated using standard
protocols.

Copy Number and Expression Analyses by cDNA Microarrays. The preparation
and printing of the 13,824 cDNA clones on glass slides were performed as
described (11-13). Of these clones, 244 represented uncharacterized
expressed sequence tags, and the remainder corresponded to known genes. CGH
experiments on cDNA microarrays were done as described (14, 15). Briefly,
20 μg of genomic DNA from breast cancer cell lines and normal human WBCs
were digested for 14-18 h with AluI and RsaI (Life Technologies, Inc.,
Rockville, MD) and purified by phenol/chloroform extraction. Six μg of
digested cell line DNAs were labeled with Cy3-dUTP (Amersham Pharmacia) and
normal DNA with Cy5-dUTP (Amersham Pharmacia) using the Bioprime Labeling
kit (Life Technologies, Inc.). Hybridization (14, 15) and posthybridization
washes (13) were done as described. For the expression analyses, a standard
reference (Universal Human Reference RNA; Stratagene, La Jolla, CA) was
used in all experiments. Forty μg of reference RNA were labeled with
Cy3-dUTP and 3.5 μg of test mRNA with Cy5-dUTP, and the labeled cDNAs were
hybridized on microarrays as described (13, 15). For both microarray
analyses, a laser confocal scanner (Agilent Technologies, Palo Alto, CA)
was used to measure the fluorescence intensities at the target locations
using the DEARRAY software (16). After background subtraction, average
intensities at each clone in the test hybridization were divided by the
average intensity of the corresponding clone in the control hybridization.
For the copy number analysis, the ratios were normalized on the basis of
the distribution of ratios of all targets on the array and for the
expression analysis on the basis of 88 housekeeping genes, which were
spotted four times onto the array. Low quality measurements (i.e., copy
number data with mean reference intensity <100 fluorescent units, and
expression data with both test and reference intensity <100 fluorescent
units and/or with spot size <50 units) were excluded from the analysis and
were treated as missing values. The distributions of fluorescence ratios
were used to define cutpoints for increased/decreased copy number. Genes
with CGH ratio >1.43 (representing the upper 5% of the CGH ratios across
all experiments) were considered to be amplified, and genes with ratio
<0.73 (representing the lower 5%) were considered to be deleted.

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate the
influence of copy number alterations on gene expression, we applied the
following statistical approach. CGH and cDNA calibrated intensity ratios
were log-transformed and normalized using median centering of the values in
each cell line. Furthermore, cDNA ratios for each gene across all 14 cell
lines were median centered. For each gene, the CGH data were represented by
a vector that was labeled 1 for amplification (ratio, >1.43) and 0 for no
amplification. Amplification was correlated with gene expression using the
signal-to-noise statistics (1). We calculated a weight, $w_g$, for each
gene as follows:

$$w_g = \frac{m_{g1} - m_{g0}}{\sigma_{g1} + \sigma_{g0}}$$

where $m_{g1}$, $\sigma_{g1}$ and $m_{g0}$, $\sigma_{g0}$ denote the means
and SDs for the expression levels for amplified and nonamplified cell
lines, respectively. To assess the statistical significance of each weight,
we performed 10,000 random permutations of the label vector. The
probability that a gene had a larger or equal weight by random permutation
than the original weight was denoted by α. A low α (<0.05) indicates a
strong association between gene expression and amplification.

6 Internet address: http://research.[...].html.
7 Internet address: www.genome.ucsc.edu.

Genomic Localization of cDNA Clones and Amplicon Mapping. Each cDNA clone
on the microarray was assigned to a Unigene cluster using the Unigene Build
141.⁶ A database of genomic sequence alignment information for mRNA
sequences was created from the August 2001 freeze of the University of
California Santa Cruz's GoldenPath database.⁷ The chromosome and bp
positions for each cDNA clone were then retrieved by relating these data
sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least
two adjacent clones in two or more cell lines or a CGH ratio >2.0 in at
least three adjacent clones in a single cell line. The amplicon start and
end positions were
extended to include neighboring nonamplified clones (ratio, <1.5). The
amplicon size determination was partially dependent on local clone density.

FISH. Dual-color interphase FISH to breast cancer cell lines was done as
described (17). Bacterial artificial chromosome clone RP11-361K8 was
labeled with SpectrumOrange (Vysis, Downers Grove, IL), and
SpectrumOrange-labeled probe for EGFR was obtained from Vysis.
SpectrumGreen-labeled chromosome 7 and 17 centromere probes (Vysis) were
used as a reference. A tissue microarray containing 612 formalin-fixed,
paraffin-embedded primary breast cancers (17) was applied in FISH analyses
as described (18). The use of these specimens was approved by the Ethics
Committee of the University of Basel and by the NIH. Specimens containing a
2-fold or higher increase in the number of test probe signals, as compared
with corresponding centromere signals, in at least 10% of the tumor cells
were considered to be amplified. Survival analysis was performed using the
Kaplan-Meier method and the log-rank test.

RT-PCR. The HOXB7 expression level was determined relative to GAPDH.
Reverse transcription and PCR amplification were performed using Access
RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA as a
template. HOXB7 primers were 5'-GAGCAGAGGGACTC(XjACTT-3' and
5'-GCGTCAdOTAGCCATTGTAO-3'.

RESULTS

Global Effect of Copy Number on Gene Expression. 13,824 arrayed cDNA clones
were applied for analysis of gene expression and gene copy number (CGH
microarrays) in 14 breast cancer cell lines. The results illustrate a
considerable influence of copy number on gene expression patterns. Up to
44% of the highly amplified transcripts (CGH ratio, >2.5) were
overexpressed (i.e., belonged to the global upper 7% of expression ratios),
compared with only 6% for genes with normal copy number levels (Fig. 1A).
Conversely, 10.5% of the transcripts with high-level expression (cDNA
ratio, >10) showed increased copy number (Fig. 1B). Low-level copy number
increases and decreases were also associated with similar, although less
dramatic, outcomes on gene expression (Fig. 1).

Identification of Distinct Breast Cancer Amplicons. Base-pair locations
obtained for 11,994 cDNAs (86.8%) were used to plot copy number changes as
a function of genomic position (Fig. 2, Supplement Fig. A). The average
spacing of clones throughout the genome was 267 kb. This high-resolution
mapping identified 24 independent breast cancer amplicons, spanning from
0.2 to 12 Mb of DNA (Table 1). Several amplification sites detected
previously by chromosomal CGH were validated, with 1q21, 17q12-q21.2,
17q22-q23, 20q13.1, and 20q13.2 regions being most commonly amplified.
Furthermore, the boundaries of these amplicons were precisely delineated.
In addition, novel amplicons were identified at 9p13 (38.65-39.25 Mb) and
17q21.3 (52.47-55.80 Mb).

Table 1. Summary of independent amplicons in 14 breast cancer cell lines by
CGH microarray

Fig. 3. Annotation of gene expression data on CGH microarray profiles. A,
genes in the 7p11-p12 amplicon in the MDA-468 cell line are highly
expressed (red dots) and include the EGFR oncogene. B, several genes in the
17q12, 17q21.3, and 17q23 amplicons in the BT-474 breast cancer cell line
are highly overexpressed (red) and include the HOXB7 gene. The data labels
and color coding are as indicated for Fig. 2C. Insets show chromosomal CGH
profiles for the corresponding chromosomes and validation of the increased
copy number by interphase FISH using EGFR (red) and chromosome 7 centromere
probe (green) to MDA-468, and HOXB7-specific probe (red) and chromosome 17
centromere (green) to BT-474 cells.

Fig. 4. List of 50 genes with a statistically significant correlation (α
value <0.05) between copy number and gene expression. Name, chromosomal
location, and the α value for each gene are indicated. The genes have been
ordered according to their position in the genome. The color maps on the
right illustrate the copy number and expression ratio patterns in the 14
cell lines. The key to the color code is shown at the bottom of the graph.
Gray squares, missing values. The complete list of 270 genes is shown in
supplemental Fig. B.

Direct Identification of Putative Amplification Target Genes. The cDNA/CGH
microarray technique enables the direct correlation of copy number and
expression data on a gene-by-gene basis throughout the genome. We directly
annotated high-resolution CGH plots with gene expression data using color
coding. Fig. 2C shows that most of the amplified genes in the MCF-7 breast
cancer cell line at 1p13, 17q22-q23, and 20q13 were highly overexpressed. A
view of chromosome 7 in the MDA-468 cell line implicates EGFR as the most
highly overexpressed and amplified gene at 7p11-p12 (Fig. 3A). In BT-474,
the two known amplicons at 17q12 and 17q22-q23 contained numerous highly
overexpressed genes (Fig. 3B). In addition, several genes, including the
homeobox genes HOXB2 and HOXB7, were highly amplified in a previously
undescribed independent amplicon at 17q21.3. HOXB7 was systematically
amplified (as validated by FISH, Fig. 3B, inset) as well as overexpressed
(as verified by RT-PCR, data not shown) in BT-474, UACC812, and ZR-75-30
cells. Furthermore, this novel amplification was validated to be present in
10.2% of 363 primary breast cancers by FISH to a tissue microarray and was
associated with poor prognosis of the patients (P = 0.001).
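
The FISH scoring rule stated in Materials and Methods, which underlies the 10.2% figure above, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' software: the function name `is_amplified`, the representation of a specimen as per-cell `(test, centromere)` signal counts, and the decision to skip cells without a centromere signal are assumptions made here. Only the 2-fold ratio and the 10% cell fraction come from the text.

```python
from typing import Iterable, Tuple

RATIO_CUTOFF = 2.0    # test/centromere signal ratio, as stated in the Methods
CELL_FRACTION = 0.10  # minimum fraction of tumor cells, as stated in the Methods


def is_amplified(cells: Iterable[Tuple[int, int]]) -> bool:
    """Apply the 2-fold / 10% rule to per-cell (test, centromere) signal counts."""
    # Assumption: cells with no centromere signal cannot be scored and are skipped.
    scorable = [(test, cen) for test, cen in cells if cen > 0]
    if not scorable:
        return False
    hits = sum(1 for test, cen in scorable if test / cen >= RATIO_CUTOFF)
    return hits / len(scorable) >= CELL_FRACTION


# Hypothetical specimen: 2 of 10 scored cells show a 4-fold signal excess.
specimen = [(8, 2)] * 2 + [(2, 2)] * 8
print(is_amplified(specimen))
```

Scoring each cell by its own test-to-centromere ratio, rather than pooling counts over the specimen, is one reading of the rule; a pooled ratio would be a different criterion.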

Statistical Identification and Characterization of 270 Highly Expressed
Genes in Amplicons. Statistical comparison of expression levels of all
genes as a function of gene amplification identified 270 genes whose
expression was significantly influenced by copy number across all 14 cell
lines (Fig. 4, Supplemental Fig. B). According to the gene ontology data,⁸
91 of the 270 genes represented hypothetical proteins or genes with no
functional annotation, whereas 179 had associated functional information
available. Of these, 151 (84%) are implicated in apoptosis, cell
proliferation, signal transduction, and transcription, whereas 28 (16%) had
functional annotations that could not be directly linked with cancer.
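
The statistic behind this gene list can be sketched as follows. The snippet computes the signal-to-noise weight described in Materials and Methods for a single gene and estimates α by shuffling the amplification labels. It is an illustration under stated assumptions, not the authors' code: the function names, the use of NumPy, the population SD (NumPy's default `ddof=0`), the fixed random seed, and the toy expression values are all choices made here. The source specifies only the form of the weight, the 10,000 permutations, and the α < 0.05 cutoff.

```python
import numpy as np


def snr_weight(expr, amplified):
    """Signal-to-noise weight: (mean1 - mean0) / (sd1 + sd0)."""
    expr = np.asarray(expr, dtype=float)
    amplified = np.asarray(amplified, dtype=bool)
    x1, x0 = expr[amplified], expr[~amplified]
    # Assumption: population SD (ddof=0); the source does not say which SD is used.
    return (x1.mean() - x0.mean()) / (x1.std() + x0.std())


def permutation_alpha(expr, amplified, n_perm=10_000, seed=0):
    """Return the observed weight and the fraction of permutations with weight >= it."""
    rng = np.random.default_rng(seed)
    amplified = np.asarray(amplified, dtype=bool)
    observed = snr_weight(expr, amplified)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(amplified)  # shuffle labels, keep group sizes
        if snr_weight(expr, shuffled) >= observed:
            count += 1
    return observed, count / n_perm


# Hypothetical gene: median-centered log ratios in 14 cell lines, 3 of them amplified.
expr = [2.1, 1.8, 2.4, 0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.1]
labels = [1, 1, 1] + [0] * 11
weight, alpha = permutation_alpha(expr, labels)
print(f"weight={weight:.2f}, alpha={alpha:.4f}, significant={alpha < 0.05}")
```

Missing values and genes for which both groups have zero spread would need explicit handling before this could be applied to a full array; the sketch leaves that out.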



8 Internet address: http://www.geneontology.org/.



DISCUSSION 

The importance of recurrent gene and chromosome copy number changes in the
development and progression of solid tumors has been characterized in >1000
publications applying CGH⁹ (9, 10), as well as in a large number of other
molecular cytogenetic, cytogenetic, and molecular genetic studies. The
effects of these somatic genetic changes on gene expression levels have
remained largely unknown, although a few studies have explored gene
expression changes occurring in specific amplicons (15, 19-21). Here, we
applied genome-wide cDNA microarrays to identify transcripts whose
expression changes were attributable to underlying gene copy number
alterations in breast cancer.

The overall impact of copy number on gene expression patterns was
substantial, with the most dramatic effects seen in the case of high-level
copy number increase. Low-level copy number gains and losses also had a
significant influence on expression levels of genes in the regions
affected, but these effects were more subtle on a gene-by-gene basis than
those of high-level amplifications. However, the impact of low-level gains
on the dysregulation of gene expression patterns in cancer may be equally
important if not more important than that of high-level amplifications.
Aneuploidy and low-level gains and losses of chromosomal arms represent the
most common types of genetic alterations in breast and other cancers and,
therefore, have an influence on many genes. Our results in breast cancer
extend the recent studies on the impact of aneuploidy on global gene
expression patterns [...] model system (22-24).

The CGH microarray analysis identified 24 independent breast [...]
proximity to other larger amplicons. One of these novel amplicons involved
the homeobox gene region at 17q21.3 and led to the overexpression of the
HOXB7 and HOXB2 genes. The homeodomain transcription factors are known to
be key regulators of embryonic development and have been occasionally
reported to undergo aberrant expression in cancer (27, 28). HOXB7
transfection induced cell proliferation in melanoma, breast, and ovarian
cancer cells and increased tumorigenicity and angiogenesis in breast cancer
(29-32). The present results imply that gene amplification may be a
prominent mechanism for overexpressing HOXB7 in breast cancer and suggest
that HOXB7 contributes to tumor progression and confers an aggressive
disease phenotype in breast cancer. This view is supported by our finding
of amplification of HOXB7 in 10% of 363 primary breast cancers, as well as
an association of amplification with poor prognosis of the patients.

We carried out a systematic search to identify genes whose expression
levels across all 14 cell lines were attributable to amplification status.
Statistical analysis revealed 270 such genes (representing ~2% of all genes
on the array), including not only previously described amplified genes,
such as HER-2, MYC, EGFR, ribosomal protein s6 kinase, and AIB3, but also
numerous novel genes such as NRAS-related gene (1p13), syndecan-2 (8q22),
and bone morphogenic protein (20q13.1), whose activation by amplification
may similarly promote breast cancer progression. Most of the 270 genes have
not been implicated previously in breast cancer development and suggest
novel pathogenetic mechanisms. Although we would not expect all of them to
be causally involved, it is intriguing that 84% of the genes with
associated functional information were implicated in apoptosis, cell
proliferation, signal transduction, transcription, or other cellular
processes that could directly imply a possible role in cancer progression.
Therefore, a detailed characterization of these genes may provide
biological insights to breast cancer progression and might lead to the
development of novel therapeutic strategies.

In summary, we demonstrate application of cDNA microarrays to the analysis
of both copy number and expression levels of over 12,000 transcripts
throughout the breast cancer genome, roughly once every 267 kb. This
analysis provided: (a) evidence of a prominent global influence of copy
number changes on gene expression levels; (b) a high-resolution map of 24
independent amplicons in breast cancer; and (c) identification of a set of
270 genes, the overexpression of which was statistically attributable to
gene amplification. Characterization of a novel amplicon at 17q21.3
implicated amplification and overexpression of the HOXB7 gene in breast
cancer, including a clinical association between HOXB7 amplification and
poor patient prognosis. Overall, our results illustrate how the
identification of genes activated by gene amplification provides a powerful
approach to highlight genes with an important role in cancer as well as to
prioritize and validate putative targets for therapy development.

9 Internet address: http://www.ncbi.nlm.nih.gov/entrez.

REFERENCES

1. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M.,
   Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A.,
   Bloomfield, C. D., and Lander, E. S. Molecular classification of cancer:
   class discovery and class prediction [...]
2. [...] Boldrick, J. C., Sabet, H., Tran, T., Yu, X., et al. Distinct types
   of diffuse large B-cell lymphoma identified by gene expression profiling.
   Nature (Lond.), 403: 503-511, [...]